= TactfulTokenizer

{<img src="https://badge.fury.io/rb/tactful_tokenizer.png" alt="Gem Version" />}[http://badge.fury.io/rb/tactful_tokenizer]
{<img src="https://travis-ci.org/zencephalon/Tactful_Tokenizer.png?branch=release" alt="Build Status" />}[https://travis-ci.org/zencephalon/Tactful_Tokenizer]
{<img src="https://codeclimate.com/github/zencephalon/Tactful_Tokenizer.png" />}[https://codeclimate.com/github/zencephalon/Tactful_Tokenizer]
{<img src="https://coveralls.io/repos/zencephalon/Tactful_Tokenizer/badge.png?branch=release" alt="Coverage Status" />}[https://coveralls.io/r/zencephalon/Tactful_Tokenizer?branch=release]

TactfulTokenizer is a Ruby library for high quality sentence
tokenization. It uses a Naive Bayesian statistical model, and
is based on Splitta[http://code.google.com/p/splitta/], but 
has support for '?' and '!' as well as primitive handling of 
XHTML markup. Better support for XHTML parsing is coming shortly.

Additionally supports unicode text tokenization.

== Usage

 require "tactful_tokenizer"
 m = TactfulTokenizer::Model.new
 m.tokenize_text("Here in the U.S. Senate we prefer to eat our friends. Is it easier that way? <em>Yes.</em> <em>Maybe</em>!")
 #=> ["Here in the U.S. Senate we prefer to eat our friends.", "Is it easier that way?", "<em>Yes.</em>", "<em>Maybe</em>!"]

The input text is expected to consist of paragraphs delimited
by line breaks.

== Installation
  gem install tactful_tokenizer

== Author

Copyright (c) 2010 Matthew Bunday. All rights reserved.
Released under the {GNU GPL v3}[http://www.gnu.org/licenses/gpl.html].
